perf(player): coalesce _mirrorParentMediaTime writes (#396)
Conversation
jrusso1020 left a comment:
Right design. Two-sample gate absorbs single-tick jitter (bridge interval, GC pause, throttled rAF) without thrashing currentTime, and the force: true escape hatch preserves zero-tolerance semantics at ownership promotion and proxy initialization — the two moments where a misaligned audible sample would be perceptually unacceptable.
Counter-reset-on-threshold-recovery is the critical invariant and it's exercised directly (a sample coming back within threshold clears the counter, so a later spike can't piggy-back on stale state). Similarly, out-of-range samples zero the counter — a proxy dropping out of its active window and returning later starts fresh.
The integration test (_promoteToParentProxy invokes _mirrorParentMediaTime with force: true) is the kind of call-site pin I'd want on a subtle perf contract like this: without it, a future refactor could remove { force: true } from the promotion path and every test still passes even though ownership flips would audibly drift.
Worth-calling-out-but-not-blocking: MIRROR_REQUIRED_CONSECUTIVE_DRIFT_SAMPLES = 2 is a constant — if bridge cadence ever changes (the comment cites bridgeMaxPostIntervalMs: 80, so worst-case correction latency is ~160ms), 2 may need to be re-tuned. Consider a doc-comment cross-reference so whoever touches the bridge cadence sees the coupling.
Approved.
— James Russo
@jrusso1020 — thanks for the review. The non-blocking observation is now addressed in a follow-up doc-only commit.
Per reviewer feedback on #396: surface the coupling between the mirror-loop's required-consecutive-drift-samples constant and the timeline-control bridge's max post interval, in both directions, so a future change to either side immediately points at the other. The coupling is:

    worst_case_correction_latency_ms ≈ MIRROR_REQUIRED_CONSECUTIVE_DRIFT_SAMPLES × bridgeMaxPostIntervalMs

At today's values (2 × 80 ms = 160 ms) this stays under the perceptual A/V re-sync tolerance. Doc-only change — no behavior delta.
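A hedged sketch of what the cross-referencing doc comments might look like — the file locations and surrounding code are assumptions; only the two constants and the latency formula come from the PR:

```typescript
// Mirror-loop side (sketch, not the actual source):
/**
 * Consecutive over-threshold drift samples required before
 * _mirrorParentMediaTime writes back to el.currentTime.
 *
 * COUPLED to the timeline-control bridge's `bridgeMaxPostIntervalMs`:
 *   worst_case_correction_latency_ms ≈
 *     MIRROR_REQUIRED_CONSECUTIVE_DRIFT_SAMPLES × bridgeMaxPostIntervalMs
 * At today's values (2 × 80 ms = 160 ms) this stays under the perceptual
 * A/V re-sync tolerance. Re-check the product if either side changes.
 */
const MIRROR_REQUIRED_CONSECUTIVE_DRIFT_SAMPLES = 2;

// Bridge side (mirror of the note above):
/**
 * Max interval between timeline posts. COUPLED to
 * MIRROR_REQUIRED_CONSECUTIVE_DRIFT_SAMPLES in the mirror loop — see that
 * constant's doc comment for the worst-case correction-latency formula.
 */
const bridgeMaxPostIntervalMs = 80;

// Worst-case correction latency at today's values:
const worstCaseMs =
  MIRROR_REQUIRED_CONSECUTIVE_DRIFT_SAMPLES * bridgeMaxPostIntervalMs; // 160
```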
Require two consecutive readings above `MIRROR_DRIFT_THRESHOLD_SECONDS` before writing back to `el.currentTime`, eliminating seek thrash from transient jitter on the parent timeline. Each `_parentMedia` entry gains a `driftSamples` counter; `promoteToParentProxy` and `_onIframeMediaAdded` pass `force: true` to keep one-shot alignments immediate. Adds 11 unit/integration tests covering jitter, trending drift, force override, out-of-range proxies, multi-proxy independence, and `_promoteToParentProxy` alignment.
… drift (#400)

## Summary

Second slice of `P0-1` from the player perf proposal: plugs the three steady-state scenarios — sustained playback FPS, scrub latency, and media-sync drift — into the perf gate that landed in #399. Adds the multi-video fixture they all share, wires three new shards into CI, and seeds one new baseline (`droppedFramesMax`).

## Why

#399 stood up the harness and proved it with a single load-time scenario. By itself that's enough to catch regressions in initial composition setup, but it can't catch the things players actually fail at in production:

- **FPS regressions** — a render-loop change that drops the ticker from 60 to 45 fps still loads fast.
- **Scrub latency regressions** — the inline-vs-isolated split (#397) is exactly the kind of code path where a refactor can silently push everyone back to the postMessage round trip.
- **Media drift** — runtime mirror logic (#396 in this stack) and per-frame scheduling tweaks can both cause video to slip out of sync with the composition clock without producing a single console error.

Each of these is a target metric in the proposal with a concrete budget. This PR turns those budgets into gated CI signals and produces continuous data for them on every player/core/runtime change.

## What changed

### Fixture — `packages/player/tests/perf/fixtures/10-video-grid/`

- `index.html`: 10-second composition, 1920×1080, 30 fps, with 10 simultaneously-decoding video tiles in a 5×2 grid plus a subtle GSAP scale "breath" on each tile (so the rAF/RVFC loops have real work to do without GSAP dominating the budget the decoder needs).
- `sample.mp4`: small (~190 KB) clip checked in so the fixture is hermetic — no external CDN dependency, identical bytes on every run.
- Same `data-composition-id="main"` host pattern as `gsap-heavy`, so the existing harness loader works without changes.

### `02-fps.ts` — sustained playback frame rate

- Loads `10-video-grid`, calls `player.play()`, samples `requestAnimationFrame` callbacks inside the iframe for 5 s.
- Crucial sequencing: install the rAF sampler **before** `play()`, wait for `__player.isPlaying() === true`, **then reset the sample buffer** — otherwise the postMessage round-trip ramp-up window drags the average down by 5–10 fps.
- FPS = `(samples − 1) / (lastTs − firstTs in s)`; uses rAF timestamps (the same ones the compositor saw) rather than wall-clock `setTimeout`, so we're measuring real frame production.
- Dropped-frame definition matches Chrome DevTools: gap > 1.5× (1000/60 ms) ≈ 25 ms = "missed at least one vsync."
- Aggregation across runs: `min(fps)` and `max(droppedFrames)` — worst case wins, since the proposal asserts a floor on fps and a ceiling on drops.
- Emits `playback_fps_min` (higher-is-better, baseline `fpsMin = 55`) and `playback_dropped_frames_max` (lower-is-better, baseline `droppedFramesMax = 3`).

### `04-scrub.ts` — scrub latency, inline + isolated

- Loads `10-video-grid`, pauses, then issues 10 seek calls in two batches: first the synchronous **inline** path (`<hyperframes-player>`'s default same-origin `_trySyncSeek`), then the **isolated** path (forced by replacing `_trySyncSeek` with `() => false`, which makes the player fall back to the postMessage `_sendControl("seek")` bridge that cross-origin embeds and pre-#397 builds use).
- Inline runs first so the isolated mode's monkey-patch can't bleed back into the inline samples.
- Detection: a rAF watcher inside the iframe polls `__player.getTime()` until it's within `MATCH_TOLERANCE_S = 0.05 s` of the requested target. Tolerance exists because the postMessage bridge converts seconds → frame number → seconds, and that round-trip can introduce sub-frame quantization drift even for targets on the canonical fps grid.
- Timing: `performance.timeOrigin + performance.now()` in both contexts. `timeOrigin` is consistent across same-process frames, so `t1 − t0` is a true wall-clock latency, not a host-only or iframe-only stopwatch.
- Targets alternate forward/backward (`1.0, 7.0, 2.0, 8.0, 3.0, 9.0, 4.0, 6.0, 5.0, 0.5`) so no two consecutive seeks land near each other — protects the rAF watcher from matching against a stale `getTime()` value before the seek command is processed.
- Aggregation: `percentile(95)` across the pooled per-seek latencies from every run. With 10 seeks × 2 modes × 3 runs we get 30 samples per mode per CI shard, enough for a stable p95.
- Emits `scrub_latency_p95_inline_ms` (lower-is-better, baseline `scrubLatencyP95InlineMs = 33`) and `scrub_latency_p95_isolated_ms` (lower-is-better, baseline `scrubLatencyP95IsolatedMs = 80`).

### `05-drift.ts` — media sync drift

- Loads `10-video-grid`, plays 6 s, instruments **every** `video[data-start]` element with `requestVideoFrameCallback`. Each callback records `(compositionTime, actualMediaTime)` plus a snapshot of the clip transform (`clipStart`, `clipMediaStart`, `clipPlaybackRate`).
- Drift = `|actualMediaTime − ((compTime − clipStart) × clipPlaybackRate + clipMediaStart)|` — the same transform the runtime applies in `packages/core/src/runtime/media.ts`, snapshotted once at sampler install so the per-frame work is just subtract + multiply + abs.
- Sustain window is 6 s (not the proposal's 10 s) because the fixture composition is exactly 10 s long and we want headroom before the end-of-timeline pause/clamp behavior. With 10 videos × ~25 fps × 6 s we still pool ~1500 samples per run — more than enough for a stable p95.
- Same "reset buffer after play confirmed" gotcha as `02-fps.ts`: frames captured during the postMessage round-trip would compare a non-zero `mediaTime` against `getTime() === 0` and inflate drift by hundreds of ms.
- Aggregation: `max()` and `percentile(95)` across the pooled per-frame drifts. The proposal's max-drift ceiling of 500 ms is intentional — the runtime hard-resyncs when `|currentTime − relTime| > 0.5 s`, so a regression past 500 ms means the corrective resync kicked in and the viewer saw a jump.
- Emits `media_drift_max_ms` (lower-is-better, baseline `driftMaxMs = 500`) and `media_drift_p95_ms` (lower-is-better, baseline `driftP95Ms = 100`).

### Wiring

- `packages/player/tests/perf/index.ts`: add `fps`, `scrub`, `drift` to `ScenarioId`, `DEFAULT_RUNS`, the default scenario list (`--scenarios` defaults to all four), and three new dispatch branches.
- `packages/player/tests/perf/perf-gate.ts`: add `droppedFramesMax: number` to `PerfBaseline`. Other baseline keys for these scenarios were already seeded in #399.
- `packages/player/tests/perf/baseline.json`: add `droppedFramesMax: 3`.
- `.github/workflows/player-perf.yml`: three new matrix shards (`fps` / `scrub` / `drift`) at `runs: 3`. Same `paths-filter` and same artifact-upload pattern as the `load` shard, so the summary job aggregates them automatically.

## Methodology highlights

These three patterns recur in all three scenarios and are worth noting because they're load-bearing for the numbers we report:

1. **Reset buffer after play-confirmed.** The `play()` API is async (postMessage), so any samples captured before `__player.isPlaying() === true` belong to ramp-up, not steady-state. Both `02-fps` and `05-drift` clear `__perfRafSamples` / `__perfDriftSamples` *after* the wait. Without this, fps drops 5–10 and drift inflates by hundreds of ms.
2. **Iframe-side timing.** All three scenarios time inside the iframe (`performance.timeOrigin + performance.now()` for scrub, rAF/RVFC timestamps for fps/drift) rather than host-side. The iframe is what the user sees; host-side timing would conflate Puppeteer's IPC overhead with real player latency.
3. **Stop sampling before pause.** Sampler is deactivated *before* `pause()` is issued, so the pause command's postMessage round-trip can't perturb the tail of the measurement window.

## Test plan

- [x] Local: `bun run player:perf` runs all four scenarios end-to-end on the 10-video-grid fixture.
- [x] Each scenario produces metrics matching its declared `baselineKey` so `perf-gate.ts` can find them.
- [x] Typecheck, lint, format pass on the new files.
- [x] Existing player unit tests untouched (no production code changes in this PR).
- [ ] First CI run will confirm the new shards complete inside the workflow timeout and that the summary job picks up their `metrics.json` artifacts.

## Stack

Step `P0-1b` of the player perf proposal. Builds on:

- `P0-1a` (#399): the harness, runner, gate, and CI workflow this PR plugs new scenarios into.

Followed by:

- `P0-1c` (#401): `06-parity` — live playback frame vs. synchronously-seeked reference frame, compared via SSIM, on the existing `gsap-heavy` fixture from #399.
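The FPS and dropped-frame arithmetic described for `02-fps.ts` can be sketched as below; the function name and shapes are illustrative assumptions, not the scenario's actual API:

```typescript
// Sketch of the rAF-based FPS / dropped-frame math: FPS counts intervals
// between timestamps, and a "drop" is any gap wider than 1.5 vsyncs at 60 Hz.
function computeFpsStats(rafTimestampsMs: number[]): {
  fps: number;
  droppedFrames: number;
} {
  const n = rafTimestampsMs.length;
  const durationS = (rafTimestampsMs[n - 1] - rafTimestampsMs[0]) / 1000;
  const fps = (n - 1) / durationS; // intervals per second, not samples
  // Chrome DevTools-style drop: gap > 1.5 × (1000/60 ms) ≈ 25 ms
  const dropGapMs = 1.5 * (1000 / 60);
  let droppedFrames = 0;
  for (let i = 1; i < n; i++) {
    if (rafTimestampsMs[i] - rafTimestampsMs[i - 1] > dropGapMs) {
      droppedFrames += 1;
    }
  }
  return { fps, droppedFrames };
}
```

Using `(samples − 1)` rather than `samples` in the numerator matters: N timestamps bound N − 1 frame intervals, so counting samples would overstate FPS on short windows.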
## Summary

Adds **scenario 06: live-playback parity** — the third and final tranche of the P0-1 perf-test buildout (`p0-1a` infra → `p0-1b` fps/scrub/drift → this). The scenario plays the `gsap-heavy` fixture, freezes it mid-animation, screenshots the live frame, then synchronously seeks the same player back to that exact timestamp and screenshots the reference. The two PNGs are diffed with `ffmpeg -lavfi ssim` and the resulting average SSIM is emitted as `parity_ssim_min`. Baseline gate: **SSIM ≥ 0.95**.

This pins the player's two frame-production paths (the runtime's animation loop vs. `_trySyncSeek`) to each other visually, so any future drift between scrub and playback fails CI instead of silently shipping.

## Motivation

`<hyperframes-player>` produces frames two different ways:

1. **Live playback** — the runtime's animation loop advances the GSAP timeline frame-by-frame.
2. **Synchronous seek** (`_trySyncSeek`, landed in #397) — for same-origin embeds, the player calls into the iframe runtime's `seek()` directly and asks for a specific time.

These paths must agree. If they don't — different rounding, different sub-frame sampling, different state ordering — scrubbing a paused composition shows different pixels than a paused-during-playback frame at the same time. That's a class of bug that only surfaces visually, never in unit tests, and only at specific timestamps where many things are mid-flight.

`gsap-heavy` is a 10s composition with 60 tiles each running a staggered 4s out-and-back tween. At t=5.0s a large fraction of those tiles are mid-flight, so the rendered frame has many distinct, position-sensitive pixels — the worst-case input for any sub-frame disagreement. If the two paths produce identical pixels here, they'll produce identical pixels everywhere that matters.

## What changed

- **`packages/player/tests/perf/scenarios/06-parity.ts`** — new scenario (~340 lines). Owns capture, seek, screenshot, SSIM, artifact persistence, and aggregation.
- **`packages/player/tests/perf/index.ts`** — register `parity` as a scenario id, default-runs = 3, dispatch to `runParity`, include in the default scenario list.
- **`packages/player/tests/perf/perf-gate.ts`** — extend `PerfBaseline` with `paritySsimMin`.
- **`packages/player/tests/perf/baseline.json`** — `paritySsimMin: 0.95`.
- **`.github/workflows/player-perf.yml`** — add a `parity` shard (3 runs) to the matrix alongside `load` / `fps` / `scrub` / `drift`.

## How the scenario works

The hard part is making the two captures land on the *exact same timestamp* without trusting `postMessage` round-trips or arbitrary `setTimeout` settling.

1. **Install an iframe-side rAF watcher** before issuing `play()`. The watcher polls `__player.getTime()` every animation frame and, the first time `getTime() >= 5.0`, calls `__player.pause()` *from inside the same rAF tick*. `pause()` is synchronous (it calls `timeline.pause()`), so the timeline freezes at exactly that `getTime()` value with no postMessage round-trip. The watcher's Promise resolves with that frozen value as the canonical `T_actual` for the run.
2. **Confirm `isPlaying() === true`** via `frame.waitForFunction` before awaiting the watcher. Without this, the test can hang if `play()` hasn't kicked the timeline yet.
3. **Wait for paint** — two `requestAnimationFrame` ticks on the host page. The first flushes pending style/layout, the second guarantees a painted compositor commit. Same paint-settlement pattern as `packages/producer/src/parity-harness.ts`.
4. **Screenshot the live frame** — `page.screenshot({ type: "png" })`.
5. **Synchronously seek to `T_actual`** — call `el.seek(capturedTime)` on the host page. The player's public `seek()` calls `_trySyncSeek` which (same-origin) calls `__player.seek()` synchronously, so no postMessage await is needed. The runtime's deterministic `seek()` rebuilds frame state at exactly the requested time.
6. **Wait for paint** again, screenshot the reference frame.
7. **Diff with ffmpeg** — `ffmpeg -hide_banner -i reference.png -i actual.png -lavfi ssim -f null -`. ffmpeg writes per-channel + overall SSIM to stderr; we parse the `All:` value, clamp at 1.0 (ffmpeg occasionally reports 1.000001 on identical inputs), and treat it as the run's score.
8. **Persist artifacts** under `tests/perf/results/parity/run-N/` (`actual.png`, `reference.png`, `captured-time.txt`) so CI can upload them and so a failed run is locally reproducible. Directory is already gitignored via the existing `packages/player/tests/perf/results/` rule.

### Aggregation

`min()` across runs, **not** mean. We want the *worst observed* parity to pass the gate so a single bad run can't get masked by averaging. Both per-run scores and the aggregate are logged.

### Output metric

| name | direction | baseline |
|-------------------|------------------|-----------------------|
| `parity_ssim_min` | higher-is-better | `paritySsimMin: 0.95` |

With deterministic rendering enabled in the runner, identical pixels produce SSIM very close to 1.0; the 0.95 threshold leaves headroom for legitimate fixture-level noise (font hinting, GPU compositor variance) while still catching any real disagreement between the two paths.

## Test plan

- `bun run player:perf -- --scenarios=parity --runs=3` locally on `gsap-heavy` — passes with SSIM ≈ 0.999 across all 3 runs.
- Inspected `results/parity/run-1/actual.png` and `reference.png` side-by-side — visually identical.
- Inspected `captured-time.txt` to confirm `T_actual` lands just past 5.0s (within one frame).
- Sanity test: temporarily forced a 1-frame offset between live and reference capture; SSIM dropped well below 0.95 as expected, confirming the threshold catches real drift.
- CI: `parity` shard added alongside the existing `load` / `fps` / `scrub` / `drift` shards; same `measure`-mode / artifact-upload / aggregation flow.
- `bunx oxlint` and `bunx oxfmt --check` clean on the new scenario.

## Stack

This is the top of the perf stack:

1. #393 `perf/x-1-emit-performance-metric` — performance.measure() emission
2. #394 `perf/p1-1-share-player-styles-via-adopted-stylesheets` — adopted stylesheets
3. #395 `perf/p1-2-scope-media-mutation-observer` — scoped MutationObserver
4. #396 `perf/p1-4-coalesce-mirror-parent-media-time` — coalesce currentTime writes
5. #397 `perf/p3-1-sync-seek-same-origin` — synchronous seek path (the path this PR pins)
6. #398 `perf/p3-2-srcdoc-composition-switching` — srcdoc switching
7. #399 `perf/p0-1a-perf-test-infra` — server, runner, perf-gate, CI
8. #400 `perf/p0-1b-perf-tests-for-fps-scrub-drift` — fps / scrub / drift scenarios
9. **#401 `perf/p0-1c-live-playback-parity-test` ← you are here**

With this PR landed the perf harness covers all five proposal scenarios: `load`, `fps`, `scrub`, `drift`, `parity`.
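The ffmpeg SSIM parsing step might look roughly like this; the exact stderr line format and the function name are assumptions, not the scenario's actual code:

```typescript
// Hedged sketch of parsing the ssim filter's stderr output. ffmpeg logs a
// line like:
//   [Parsed_ssim_0 @ ...] SSIM Y:0.998 U:0.999 V:0.999 All:0.998 (27.1)
// We extract the overall "All:" value and clamp it, since ffmpeg can report
// slightly above 1.0 on identical inputs.
function parseSsimAll(ffmpegStderr: string): number {
  const match = ffmpegStderr.match(/All:([\d.]+)/);
  if (!match) {
    throw new Error("no SSIM 'All:' value found in ffmpeg output");
  }
  return Math.min(1, parseFloat(match[1]));
}
```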

## Summary

Coalesce writes to `el.currentTime` inside `_mirrorParentMediaTime` so a single jitter sample no longer triggers a parent-media seek. A drift correction now requires two consecutive samples above the threshold (`MIRROR_DRIFT_THRESHOLD_SECONDS`) before the player writes back. One-shot alignment paths (`promoteToParentProxy`, `_onIframeMediaAdded`) opt out via `force: true` so initial alignment stays immediate.

## Why
Step `P1-4` of the player perf proposal. `_mirrorParentMediaTime` is called every animation frame on parent media proxies. Even without true drift, browser internals report tiny jitter on `currentTime` reads — typically below 30 ms but occasionally crossing the threshold for a frame. Writing to `currentTime` triggers a seek, which is expensive and invalidates pipeline buffers, which in turn makes the next frame's reading jitter further. The result was unnecessary seek thrash on otherwise-aligned media.

By requiring two consecutive over-threshold samples, transient jitter is filtered out while real drift (a sustained offset) still corrects with only ~1 extra frame of latency. This eliminates the most common cause of dropped frames on the studio thumbnail grid.
## What changed
- Each `_parentMedia` entry gains a `driftSamples` counter that increments while the absolute drift is above `MIRROR_DRIFT_THRESHOLD_SECONDS` and resets to 0 on the first sample below it.
- `_mirrorParentMediaTime(el, opts)` only writes back when `driftSamples >= 2`, except when `opts.force === true`.
- `promoteToParentProxy` and `_onIframeMediaAdded` pass `force: true` so the first alignment after registration is still immediate (these are user-visible state transitions, not steady-state telemetry).

## Test plan
New tests in `hyperframes-player.test.ts` covering:

- `force: true` override bypasses the sample requirement.
- `_promoteToParentProxy` alignment is immediate.

## Stack
Step `P1-4` of the player perf proposal. Builds on `P1-1` (shared adopted stylesheets) and `P1-2` (scoped media observer). Together these three target the studio multi-player render path — the `P0-1*` perf gate scenarios will pick up the wins automatically.